Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new Enso Hash Codes and Comparable #6060

Conversation

jdunkerley
Copy link
Member

@jdunkerley jdunkerley commented Mar 23, 2023

Pull Request Description

Enables distinct, aggregate and cross_tab to use the Enso hashing and equality operations.
Also, I rewired the way the ObjectComparators are obtained in polyglot code to be more consistent.

Add Comparator for Day_Of_Week, Header, SQL_Type, Image and Matrix.
Also, removed the custom == from these types as needed. (Closes #5626)

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@jdunkerley jdunkerley force-pushed the wip/jd/5046-investigate-the-possibility-of-passing-through-the-enso-map-to-unorderedmultivalueindex branch from 90b5d40 to 410cbdc Compare March 23, 2023 16:19
@jdunkerley jdunkerley added the CI: No changelog needed Do not require a changelog entry for this PR. label Mar 23, 2023
@jdunkerley jdunkerley marked this pull request as ready for review March 23, 2023 18:13
@jdunkerley jdunkerley requested a review from radeusgd as a code owner March 23, 2023 18:13
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments, but I need to understand the changes more...

@@ -49,3 +53,14 @@ type Day_Of_Week
Day_Of_Week.Thursday -> DayOfWeek.THURSDAY
Day_Of_Week.Friday -> DayOfWeek.FRIDAY
Day_Of_Week.Saturday -> DayOfWeek.SATURDAY

## PRIVATE
type Day_Of_Week_Comparator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks exactly as I have envisioned that when dreaming about Comparator design.

if Java_Image.is_equals x.opencv_mat y.opencv_mat then Ordering.Equal else
Nothing

hash x = x.opencv_mat.hashCode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. Delegating to Java hashCode when one knows there is a Java object is good.

import java.util.function.BiFunction;
import java.util.function.Function;

public class EnsoObjectWrapper implements Comparable<EnsoObjectWrapper> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the class final. Anyway, I am not yet sure what EnsoObjectWrapper is good for...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's me put Enso objects into the MulitValueIndex and hence aggregate and distinct on them using the Enso hash codes and equality.

When @Akirathan has time to do a nicer integration can switch to it but now let's all the stuff work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. #5855 was a good start for the integration, I believe.

@jdunkerley jdunkerley requested review from Akirathan and removed request for Frizi March 23, 2023 19:20
- custom_comparator:
If `Nothing` will get an ordered comparator from each element.
Otherwise can support a custom fallback compare function.
new : Nothing | (Any -> Any -> Ordering) -> ObjectComparator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of this class is probably good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is good indeed. I was always confused by the name.

private static void initCallbacks() {
if (EnsoCompareCallback == null) {
var module = Context.getCurrent().getBindings("enso").invokeMember("get_module", "Standard.Base.Data.Ordering");
var type = module.invokeMember("get_type", "Comparable");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of poking around in Enso doesn't make me particularly happy.

However the only alternative I can offer is to always make sure each call from Enso to Java (that needs this callback) passes the compare_callback function to Java.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't like such a poking in Enso. But I guess it is OK for this PR, as a temporary solution.

Copy link
Member

@Akirathan Akirathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. I am glad that we got rid of Standard.Base.Data.Ordering.Comparator, as that name always confused me. Great that we will have the ability to aggregate over enso objects in columns, although the performance won't be good, but that we be tackled in time.

- custom_comparator:
If `Nothing` will get an ordered comparator from each element.
Otherwise can support a custom fallback compare function.
new : Nothing | (Any -> Any -> Ordering) -> ObjectComparator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is good indeed. I was always confused by the name.

@Akirathan
Copy link
Member

Also, if you could please add, at least a very simple, benchmark to test/Benchmarks/src/Table/Aggregate.enso that aggregates over some enso objects in the columns, that would be very cool. I could do that myself in the future, but you are definitely more versed in the API of Table.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as a workaround to get Enso objects comparisons/hashing supported in the Table library, although it's not what I'd imagine for the longer-term solution.

The logic for folding primitives (added to be consistent with how Enso compares them) is duplicated between the engine and our Table utilities - which may lead to inconsistency - but it does not have to be anymore as we can have this logic unified in common-polyglot-core-utils. I hope #5259 will be implemented in the not-very-far future to ensure this is consistent.

Comment on lines +40 to +42
shifted = case first_day of
Day_Of_Week.Sunday -> day_number
_ -> (day_number + 7 - (first_day.to_integer start_at_zero=True)) % 7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
shifted = case first_day of
Day_Of_Week.Sunday -> day_number
_ -> (day_number + 7 - (first_day.to_integer start_at_zero=True)) % 7
shifted = (day_number + 7 - (first_day.to_integer start_at_zero=True)) % 7

btw. won't just that work? I'm not sure if we have to special-case the Sunday given that we do % later anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack overflow - to_integer would be called each time,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the culprit is #6065.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right! Okay then let's keep it.

I guess it would have worked if we did:

        day_number day = case day of
            Day_Of_Week.Sunday -> 0
            Day_Of_Week.Monday -> 1
            Day_Of_Week.Tuesday -> 2
            Day_Of_Week.Wednesday -> 3
            Day_Of_Week.Thursday -> 4
            Day_Of_Week.Friday -> 5
            Day_Of_Week.Saturday -> 6
        shifted = ((day_number self) + 7 - (day_number first_day)) % 7

right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This was only changed to avoid having == in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was just wondering on a side if we could further simplify it, removing the possibility of the infinite loop altogether.

Comment on lines +1270 to +1274
wrapped a b = case a of
Nothing -> if b.is_nothing then Ordering.Equal else if missing_last then Ordering.Greater else Ordering.Less
_ -> case b of
Nothing -> if missing_last then Ordering.Less else Ordering.Greater
_ -> by a b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it could be a separate method that will be useful in other places:
make_comparator_nullsafe missing_last by = a-> b-> ...

Comment on lines -70 to 69
public int compare(Object thisValue, Object thatValue) throws ClassCastException {
public int compare(Object thisValue, Object thatValue) {
// NULLs
if (thisValue == null) {
if (thatValue != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, this should also be removed and delegate to some common method in common-polyglot-core-utils. But ofc ok as a workaround, just - I assume this will be implemented by @Akirathan in #5259 (although I see that this is not scheduled nor assigned to him - do we plan to do this anytime soon @jdunkerley @JaroslavTulach?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code delegates to EnsoObjectWrapper and that class is using org.graalvm.polyglot.Context. We cannot/shouldn't use org.graalvm.polyglot.Context in the engine - e.g. I am afraid that isn't much to share in current implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand?

This is just a workaround that should be replaced with a proper solution. The workaround uses a Context, because it is its way to 'access' the current implementation and delegate to it (through a slow Java-to-Enso call).

The whole idea of extracting the shared code is so that the logic for computing the hash for particular primitive types can be exposed in pure Java, so that our library could call into it directly in Java, without the overhead of an Java-to-Enso call. That is what is described in #5259, although maybe I did not describe this well enough. If it's not clear - let's discuss.

Comment on lines 90 to 92
// We could do a fast-path for some known primitive types, but it doesn't matter as it will be
// replaced with hashing soon anyway.
return equalityFallback.apply(leftValue, rightValue);
return new EnsoObjectWrapper(leftValue).equals(new EnsoObjectWrapper(rightValue));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place that will need to be addressed by #5259 - especially as here we are lacking the fast-path for primitives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could move foldObject from MultiValueKey to something next to ObjectComparator and then delegate to such helper for equality? Then we would get the fast-path for primitives, which is pretty important.

Although I'm fine if we leave this as-is for now and handle the proper solution in #5259.

Comment on lines -752 to +765
Test.specify "until hashing is supported, should throw an error when trying to aggregate a custom object" <|
Test.specify "should be able to create distinct on Enso objects" <|
t = Table.new [["X", [My.Data 1 2, My.Data 3 4, My.Data 1 2]]]
t.distinct . should_fail_with Illegal_Argument
t.distinct ["X"] . at "X" . to_vector . should_equal [My.Data 1 2, My.Data 3 4]

t2 = Table.new [["X", [Day_Of_Week.Monday, Day_Of_Week.Tuesday, Day_Of_Week.Monday, Day_Of_Week.Monday, Day_Of_Week.Tuesday, Day_Of_Week.Wednesday]]]
t2.distinct ["X"] . at "X" . to_vector . should_equal [Day_Of_Week.Monday, Day_Of_Week.Tuesday, Day_Of_Week.Wednesday]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add tests like this to Table.join too.

Also, ideally would also test some My_Atom that has a custom comparator defined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use My type - there is custom_objects column here, tested on line 844.

@jdunkerley jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Mar 24, 2023
@mergify mergify bot merged commit 58f2c76 into develop Mar 24, 2023
@mergify mergify bot deleted the wip/jd/5046-investigate-the-possibility-of-passing-through-the-enso-map-to-unorderedmultivalueindex branch March 24, 2023 15:02
Procrat added a commit that referenced this pull request Mar 27, 2023
* develop:
  Layout fixes (#6066)
  Use new Enso Hash Codes and Comparable (#6060)
  Search suggestions by static attribute (#6036)
  Use .node-version for pinning Node.js version (#6057)
  Fix code generation for suggestion preview (#6054)
  Implementation of EnsoGL predefined Rectangle shape. (#6033)
  Tidy up the public module level statics (#6032)
  Cursor aware Component Browser (#5770)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
5 participants